ENVST 206: Environmental Data Science, Hamilton College

Instructions

There are 10 questions to this activity. Save your answers in word document that you will hand in on Blackboard using a .pdf or .docx extension. Save your file with your last name first in the filename to help me keep track of files for grading. Keep your script file in your GitHub folder and make sure that all changes are pushed to GitHub. Code for all questions should be clearly commented with the question number. You will include a link to your script file as a part of your final question in this activity. The assignment is worth 30 points. Include any plots you make for a question in your answers.

Learning objectives

1. Evaluate glacial recession and vegetation change in Glacier National Park

2. Learn to read in vector data in R

3. Summarize and visualize vector datasets

Introduction to the problem: glacial retreat and vegetation change

As air temperatures rise under global climate change, areas covered in snow and ice are melting. Glaciers form when more snow accumulates then melts or evaporates and the snow becomes compacted over time. Glaciers are defined as being persistent over time (don’t disappear seasonally or even every few years) and must be at least 0.1 km squared in area, and often take thousands of years to form. Glaciers often form in mountainous regions. Glacier National Park is located in Montana and known for sweeping views of glacier covered mountains. The glaciers in Glacier National Park formed at least 7,000 years ago. However, the glaciers in Glacier National Park began noticeably shrinking over the last century. There is concern about impacts on tourism, ecosystems, and hydrology as the glaciers disappear. As shown below, the receding glaciers are noticeable and changing the entire landscape.

Source: National Park Service

Glacial retreat will drastically affect the regional hydrology, vegetation, and the energy balance. For example, a surface with snow or a glacier reflects more incoming solar radiation off of the earth’s surface compared to bare rock and vegetation nearby. The loss of glaciers means that the land surface will experience even greater warming as more solar energy is absorbed. As glaciers retreat and overall snow and ice cover decreases in mountainous regions, vegetation can grow in newly formed bare surfaces. You can see in the picture above, there is grass growing in areas that were once covered in snow and ice. This growth of vegetation can drive additional change on the land surface and alter water, energy, and carbon cycling.

In this exercise, you’ll use data from the National Park Service that shows the footprint of 39 glaciers in 1966 and 2015 to show the change in glacier extent over recent decades.

Set up spatial packages

You will use a series of spatial packages. The rgdal package is good for handling spatial data types. sp has a lot of operations for working with vector data. raster is for working with raster data. Install the following packages:

install.packages(c("sp","rgdal","dplyr"))

Note that if you are using a mac, you may get some package installation errors. Please refer to the posted troubleshooting document to address these errors. These packages rely on a library in your operating system called gdal, but is is not always installed or easy for R to find on some versions of mac os.

#package for vector data
library(sp)
#package for reading in spatial data
library(rgdal)
#data manangement package
library(dplyr)

Reading in vector data

You will start by reading in the vector data of the glacier outlines. Shapefiles contain vector data and end with the .shp extension. Shapefiles aren’t actually a single file, but an assemblage of multiple files with the same name and a different extension (ex: .prj or .dbf). This helps convey all of the data for the shapefile.

These glaciers have been digitized by the National Park Service. In shapefiles, all metadata is contained in the .xml file. You can open it in a web browser to find more information. The GNP_glaciers file contains all four shapefiles for each year. Read them in using the readOGR function.

#read in shapefiles
#readOGR in rgdal does this
#glaciers in 1966
g1966 <- readOGR("/Users/hkropp/Google Drive/teaching/2020/Fall 2020/EnvDataSci/activity/data/activity 6/GNPglaciers/GNPglaciers_1966.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/hkropp/Google Drive/teaching/2020/Fall 2020/EnvDataSci/activity/data/activity 6/GNPglaciers/GNPglaciers_1966.shp", layer: "GNPglaciers_1966"
## with 39 features
## It has 13 fields
## Integer64 fields read as strings:  OBJECTID
#glaciers in 2015
g2015 <- readOGR("/Users/hkropp/Google Drive/teaching/2020/Fall 2020/EnvDataSci/activity/data/activity 6/GNPglaciers/GNPglaciers_2015.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/hkropp/Google Drive/teaching/2020/Fall 2020/EnvDataSci/activity/data/activity 6/GNPglaciers/GNPglaciers_2015.shp", layer: "GNPglaciers_2015"
## with 39 features
## It has 21 fields
## Integer64 fields read as strings:  OBJECTID Recno

Note there are 39 features (glaciers) for each shapefile and a table with information about each glacier that has 13 columns of data. Let’s investigate what the format of this data.

str(g2015)

There is a lot of information including the coordinates for the polygons and settings for drawing the polygons. Let’s take a look at what one of the glacier data look like. The shapefiles that we read in were automatically classified as spatial polygons. When you tell R to make a plot of the data, it will be recognized as making a map for a spatial dataset. Let’s look at the original glacier footprints from 1966.

#map the glaciers filling in the polygons with light blue and making the borders grey
plot(g1966, col = "lightblue2", border="grey50")

There are a few large glaciers, but many look like small dots on the landscape. It is hard to get more context for these glaciers without knowing the geography of Glacier National Park. We will use some satellite imagery to get a better context shortly.

One of the main features of a shapefile is that a table of data for each spatial feature can be stored along with the spatial features. You can preview the first 6 lines and columns of this data table using the head function.

#data stores all accompanying info/measurements for each spatial object
head(g2015@data)
##   OBJECTID Recno Year Src1_0822 Src1_nadir Src3_nadir Src2_nadir   Area2015
## 0        1  1055 2015 secondary    11.7240     0.0000     4.7346  736669.75
## 1        2  1202 2015   primary    27.0008     0.0000     0.0000  511589.79
## 2        3  1120 2015 secondary    11.7240     0.0000     4.7346   75562.60
## 3        4  1430 2015 secondary     0.0000    16.8225     0.0000 1498505.92
## 4        5  1041 2015   primary    11.7240     0.0000    26.2766   35298.01
## 5        6  1132 2015 secondary    11.7240     0.0000     4.7346  224773.89
##   Shape_leng  X_COORD Y_COORD                                            SOURCE
## 0   7741.335 268429.9 5425167 WorldView-01 Satellite imagery from Digital Globe
## 1   5184.650 295750.8 5413701 WorldView-01 Satellite imagery from Digital Globe
## 2   1255.828 268868.3 5421731 WorldView-01 Satellite imagery from Digital Globe
## 3  15290.128 303278.4 5385710 WorldView-01 Satellite imagery from Digital Globe
## 4   2259.001 273823.9 5427266 WorldView-01 Satellite imagery from Digital Globe
## 5   4090.754 276459.4 5419663 WorldView-01 Satellite imagery from Digital Globe
##             CLASSIFICA             OWNERSHIP SrcOth_nad SrcOth_dat
## 0 main body of glacier Glacier National Park     0.0000 9999/01/01
## 1 main body of glacier Glacier National Park    18.6183 2014/10/19
## 2 main body of glacier Glacier National Park     0.0000 9999/01/01
## 3 main body of glacier Glacier National Park     0.0000 9999/01/01
## 4 main body of glacier Glacier National Park     0.0000 9999/01/01
## 5 main body of glacier Glacier National Park     0.0000 9999/01/01
##                                                               COMMENT Src2_0912
## 0            used both 20150822 and 20150912 images for full coverage   primary
## 1  20150822 only 2015 image, but high nadir, also used 20141019 image       n/a
## 2             20150822 used where shading was heavy in 20150912 image   primary
## 3   used color 20150822 as secondary, some offset from 20150925 image       n/a
## 4 used image from 20150822 since later image had high off nadir angle secondary
## 5                                                          no comment   primary
##   Src3_0925          GLACNAME SOURCE_SCA
## 0       n/a   Agassiz Glacier    1:12000
## 1       n/a     Ahern Glacier    1:12000
## 2       n/a      Baby Glacier    1:12000
## 3   primary Blackfoot Glacier    1:12000
## 4       n/a   Boulder Glacier    1:12000
## 5       n/a    Carter Glacier    1:12000

You will notice here that I refer to components of the spatial dataset using the **@** symbol. This is a specific feature of spatial data in R. This table allows for data describing a spatial object to be stored. You can work with this table just like any other dataframe, but you will be able to display objects in it spatially. You can see that there are names for the glaciers, area, and other information about the location and features of the glaciers.

For a vector object, you can find the projection info in an object called proj4string.

g1966@proj4string
## CRS arguments:
##  +proj=utm +zone=12 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m
## +no_defs

Before you get started, there’s one other issue. The glacier names in 2015 don’t match the rest of the shapefiles. Compare 2015 to 1966 for reference:

#check glacier names
g1966@data$GLACNAME

g2015@data$GLACNAME

You’ll notice Miche Wabun is missing Glacier in its name and North Swiftcurrent Glacier should be abbreviated as N.. Fix that so you can use the glacier names to match up the glaciers between both years.

#fix glacier name so that it is consistent with the entire time period
g2015@data$GLACNAME <- ifelse(g2015@data$GLACNAME == "North Swiftcurrent Glacier",
                              "N. Swiftcurrent Glacier",
                                          ifelse(   g2015@data$GLACNAME ==  "Miche Wabun", 
                                               "Miche Wabun Glacier",
                                                as.character(g2015@data$GLACNAME)))

Vizualizing glacial retreat

It’s hard to get a reference for the glaciers just from the polygon file. I’ll add a basemap of an image collected by the Landsat satellite on September 5, 2014 for reference in the activity. Note that this is a single snapshot of the land surface on September 5. Don’t forget that glaciers are well defined geologically. You may see snow surrounding a glacier in the image that is not necessarily part of the glacier. You can also see the edge of the city of Kalispell and several small towns in the lower left corner of the image. The spatial resolution of this imagery is 30m x 30m. This means that every pixel covers 30 x 30 meters of land. The images will provide some reference for the location. The landsat sensors measure the light that is reflected up to the top of the atmosphere. If we overlay the red, green, and blue visible bands of light, it will look just like a photograph of the earth surface. We won’t get into working with raster data until next exercise so I will show you an overlay with the glacier polygons so that you are not overwhelmed.

It is a little hard to see the details at the scale of the entire park. Let’s zoom in on the area with a few glaciers:

Vector data analysis: glacier retreat

Next let’s analyze the area of the glaciers between 1966 and 2015. There is an area column in the glacier data tables, and the metadata indicates that the area is in meters squared.

Note: that sometimes re-sizing plots with multiple sources of spatial data can have issues re-sizing all data layers. If it looks like the extent doesn’t match after re-sizing, rerun your code again after re-sizing. It also may take a minute for plots or raster data to render on your screen.

It will be helpful to join all of this data together into a table not associated with the shapefile so that you can analyze it. Here you will use the join from dplyr. Let’s also make a dataframe with only the information we are interested in.

#lets combine area, first work with a smaller data frame
gdf66 <- data.frame(GLACNAME = g1966@data$GLACNAME,
        area66 = g1966@data$Area1966)

gdf15 <- data.frame(GLACNAME = g2015@data$GLACNAME,
                    area15 = g2015@data$Area2015)

#join all data tables by glacier name

gAll <- full_join(gdf66,gdf15, by="GLACNAME")

Below is an image of the different type of joins from the dplyr documentation. Note that you used a full join. Below is an image of the dplyr cheatsheet section on joins. You can also find more documentation on mutating joins in dplyr here:http://lindsaydbrin.github.io/CREATE_R_Workshop/Lesson_-_dplyr_join.html

Finally let’s calculate the percent change in area across the glaciers.

#calculate the % change in area from 1966 to 2015
gAll$gdiff <- ((gAll$area66-gAll$area15)/gAll$area66)*100

Let’s make a map where you color code the % glacial loss between 2015 and 1966. There is a function in sp called spplot that will color code spatial data based on a column of data within the data table of the spatial object. First you will need to add the % change back into the g1966 dataframe. Then you can use the spplot function. This will make a map that fills in your polygons with a color representing the percentage of glacial loss.

#join data with the spatial data table and overwrite into spatial data table 
g1966@data <- left_join(g1966@data, gAll, by="GLACNAME")
#use spplot to shade polygons based on the % change of labels
#first argument is the spatial object
#second is the column in of data to display with the different colors
#add a title using main
#col changes the color of the borders. This argument sets them to transparent
spplot(g1966, "gdiff", main="% change in area", col="transparent")

In this last part of the exercise you will work on your interpretation skills and use your R coding knowledge to help you summarize your data. It is often helpful to include some basic summary statistics and describe ranges of data in your data interpretation. There is just one last piece of information that might be helpful. You can subset spatial data a lot like you would a dataframe. This is helpful becuase you can subset it using the logical conditions that you have been using so far. Here’s an example.

#look at the Vulture glacier in 1966
vulture66 <- g1966[g1966@data$GLACNAME == "Vulture Glacier",]
plot(vulture66, main = "Vulture Glacier in 1966", col="slategray")